1. Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is the process of examining a dataset to understand its structure before designing final visualizations or conducting detailed analysis.
Before designing a visualization, understand your data. Look for:
- Distribution of values
- Data quality issues
- Outliers
- Correlations
- Interesting subsets
- Potential relationships
EDA was introduced by John Tukey in his 1977 book Exploratory Data Analysis.
2. Why EDA Matters
EDA helps reveal patterns hidden inside raw data.
Raw tables are difficult to interpret.
Visual exploration makes patterns easier to discover.
🧠EDA = Explore Before Explain
3. What EDA Helps You Discover
| Goal | Purpose |
| --- | --- |
| Outliers | Detect unusual values |
| Distributions | Check normality or skewness |
| Correlations | Identify relationships |
| Data Quality | Find errors or missing values |
| Transformations | Suggest log or other transformations |
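Two of these goals, data quality and transformations, can be illustrated with a minimal sketch. The data, the variable names, and the assumed valid age range of 0 to 120 are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical survey column: None marks a missing answer, -1 is an impossible age.
ages = [34, 29, None, 41, -1, 38, None, 52]

# Data quality: count missing values and collect values outside an assumed valid range.
missing_count = sum(1 for a in ages if a is None)
invalid = [a for a in ages if a is not None and not 0 <= a <= 120]
clean = [a for a in ages if a is not None and 0 <= a <= 120]

# Transformation: hypothetical incomes spanning several orders of magnitude.
# Taking log10 turns multiplicative steps into evenly spaced values.
incomes = [1_000, 10_000, 100_000, 1_000_000]
log_incomes = [math.log10(v) for v in incomes]

print("missing:", missing_count, "invalid:", invalid, "clean:", clean)
print("log10 incomes:", [round(v, 2) for v in log_incomes])
```

A log transformation is only one option; it suits strictly positive, right-skewed values such as incomes or counts.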
4. Common EDA Visualizations
- Histograms
- Boxplots
- Dot Plots
- Scatterplots
- Summary Statistics
Visual summaries work well because humans are excellent pattern recognizers.
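To show what a histogram computes, here is a rough text-only sketch. The sample values and the bin width of 10 are assumptions; a plotting library would normally draw the bars.

```python
from collections import Counter

# Hypothetical sample: ten delivery times in minutes.
values = [12, 15, 15, 16, 18, 19, 21, 22, 24, 58]
bin_width = 10  # assumed bin width

# Map each value to the lower edge of its bin, e.g. 15 -> 10 and 58 -> 50.
counts = Counter((v // bin_width) * bin_width for v in values)

# Print one bar per bin, including empty bins, so gaps stay visible.
for lower in range(min(counts), max(counts) + bin_width, bin_width):
    bar = "#" * counts[lower]
    print(f"{lower:>3}-{lower + bin_width - 1:<3} {bar}")
```

The empty bins between the main group and the single large value make the potential outlier easy to spot.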
5. Important Statistical Measures
| Measure | Description |
| --- | --- |
| Mean | Average value |
| Median | Middle value of the sorted data |
| Mode | Most frequent value |
| Range | Max − Min |
| Variance | Average squared deviation from the mean (spread of values) |
| Skewness | Measure of asymmetry of the distribution |
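The measures in the table can be computed with Python's standard `statistics` module. This is a sketch on hypothetical data; skewness has several definitions, and the one used here (the mean of cubed z-scores, using the population standard deviation) is an assumption.

```python
import statistics

# Hypothetical sample: ten delivery times in minutes, one of them unusually slow.
values = [12, 15, 15, 16, 18, 19, 21, 22, 24, 58]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
value_range = max(values) - min(values)
variance = statistics.pvariance(values)  # population variance
std_dev = statistics.pstdev(values)      # population standard deviation

# Skewness as the mean of cubed z-scores (one common definition, assumed here).
skewness = sum(((v - mean) / std_dev) ** 3 for v in values) / len(values)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance)
print("skewness:", round(skewness, 2))
```

In this sample the mean is larger than the median and the skewness is positive, which is the typical signature of a long right tail.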
6. Quartiles and IQR
Quartiles divide sorted data into four equal parts.
- Q1 = 25th percentile
- Q2 = Median (50th percentile)
- Q3 = 75th percentile
Interquartile Range (IQR) = Q3 − Q1
IQR is commonly used to detect outliers: a common rule flags values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
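A minimal sketch of an IQR-based outlier check, assuming the widely used 1.5 × IQR fences and hypothetical data. Quartile conventions differ between tools; the `"inclusive"` method of `statistics.quantiles` is just one choice.

```python
import statistics

# Hypothetical sample: ten delivery times in minutes.
values = [12, 15, 15, 16, 18, 19, 21, 22, 24, 58]

# Quartiles; other tools may interpolate slightly differently.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (a convention, not a law).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```

A flagged value is a candidate for inspection, not automatically an error.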
7. EDA in a Nutshell
Always inspect every variable in your dataset.
- Look at distributions
- Identify unusual values
- Check assumptions
- Explore relationships
🧠Bottom Line:
Always look at your data before modeling it.
8. What is Interaction?
Interaction between people and machines requires mutual intelligibility or shared understanding.
Interactive visualization allows users to actively explore and manipulate visual representations of data.
9. Taxonomy of Interaction
| Category | Techniques |
| --- | --- |
| Data & View Specification | Visualize, Filter, Sort, Derive |
| View Manipulation | Select, Navigate, Coordinate, Organize |
| Process & Provenance | Record, Annotate, Share, Guide |
10. Data & View Specification
Allows users to determine what data is displayed and how it is displayed.
- Visualize
- Filter
- Sort
- Derive
🧠Choose the Data + Choose the View
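Filter, Sort, and Derive can be sketched as plain data operations that run before anything is drawn. The house records and field names below are hypothetical.

```python
# Hypothetical house listings; the field names are assumptions for illustration.
houses = [
    {"id": "A", "price": 320_000, "area_m2": 80},
    {"id": "B", "price": 450_000, "area_m2": 120},
    {"id": "C", "price": 280_000, "area_m2": 56},
    {"id": "D", "price": 610_000, "area_m2": 150},
]

# Derive: compute a new attribute from existing ones.
derived = [{**h, "price_per_m2": h["price"] / h["area_m2"]} for h in houses]

# Filter: keep only the rows of interest.
affordable = [h for h in derived if h["price"] <= 500_000]

# Sort: order the remaining rows by the derived attribute.
ranked = sorted(affordable, key=lambda h: h["price_per_m2"])

for h in ranked:
    print(h["id"], h["price"], round(h["price_per_m2"]))
```

The Visualize step would then map `ranked` to marks on screen, for example one bar per house.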
11. View Manipulation
Allows users to interact directly with visual representations.
- Select
- Navigate
- Coordinate
- Organize
12. Process & Provenance
Supports tracking and communicating analysis progress.
- Record actions
- Annotate findings
- Share insights
- Guide exploration
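One way to picture Record, Annotate, and Share is a simple log of analysis steps. The `AnalysisHistory` class below is a hypothetical sketch, not the design of any particular tool, and the recorded steps and note are invented for illustration.

```python
import json


class AnalysisHistory:
    """Hypothetical log of analysis steps (illustration only)."""

    def __init__(self):
        self.steps = []

    def record(self, action, **params):
        # Record: store what was done and with which parameters.
        self.steps.append({"action": action, "params": params, "note": None})

    def annotate(self, note):
        # Annotate: attach a finding to the most recent step.
        if not self.steps:
            raise ValueError("nothing to annotate yet")
        self.steps[-1]["note"] = note

    def share(self):
        # Share: serialize the history so someone else can read or replay it.
        return json.dumps(self.steps, indent=2)


history = AnalysisHistory()
history.record("filter", field="price", maximum=500_000)
history.record("sort", field="price_per_m2")
history.annotate("Cheapest homes per square metre are the largest ones")
shared = history.share()
print(shared)
```

A stored history like this could also support guidance, for instance by suggesting steps that earlier analysts took next.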
13. Reorderable Matrix
Rows and columns can be reordered to reveal patterns and relationships.
Rearranging data often reveals hidden structures.
Used to discover relationships through permutation.
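The effect of permutation can be shown on a tiny binary matrix. This sketch reorders rows and columns with a simple lexicographic sort, which is an assumed heuristic chosen for brevity; the matrix and labels are hypothetical.

```python
# Hypothetical binary matrix: rows are objects, columns are attributes.
row_labels = ["r1", "r2", "r3", "r4"]
col_labels = ["c1", "c2", "c3", "c4"]
matrix = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]


def permute(matrix, row_order, col_order):
    """Return a copy of the matrix with rows and columns rearranged."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


# Sort rows so that identical profiles become neighbours.
row_order = sorted(range(len(matrix)), key=lambda r: matrix[r], reverse=True)

# Sort columns the same way, reading each column in the new row order.
col_order = sorted(
    range(len(col_labels)),
    key=lambda c: [matrix[r][c] for r in row_order],
    reverse=True,
)

reordered = permute(matrix, row_order, col_order)

print("rows:", [row_labels[r] for r in row_order])
print("cols:", [col_labels[c] for c in col_order])
for row in reordered:
    print(row)
```

The values are unchanged; only the order differs, yet the alternating pattern becomes two solid blocks that reveal two groups of objects.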
14. Matrix Files & Image Files
| Technique | Purpose |
| --- | --- |
| Image File | Represent ordered objects visually |
| Matrix File | Handle very large dimensions |
Sorting is used to discover correlations.
15. Selection Techniques
| Type | Example |
| --- | --- |
| Point Selection | Mouse click, hover, tap |
| Region Selection | Lasso, rubber-band selection |
🧠Point = One Item
Region = Many Items
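Both selection types reduce to hit tests against the plotted points. The sketch below assumes points stored by id in screen coordinates, a hypothetical pick radius of 0.5, and a rectangular rubber band; a lasso would need a point-in-polygon test instead.

```python
import math

# Hypothetical plotted points: id -> (x, y) in screen coordinates.
points = {"a": (1.0, 1.0), "b": (2.0, 3.0), "c": (4.0, 1.5), "d": (5.0, 4.0)}


def point_select(points, cursor, radius=0.5):
    """Return the id of the point nearest the cursor, or None if none is within radius."""
    nearest = min(points, key=lambda pid: math.dist(points[pid], cursor))
    return nearest if math.dist(points[nearest], cursor) <= radius else None


def region_select(points, corner1, corner2):
    """Return the ids of all points inside the rubber-band rectangle."""
    x_lo, x_hi = sorted((corner1[0], corner2[0]))
    y_lo, y_hi = sorted((corner1[1], corner2[1]))
    return {
        pid
        for pid, (x, y) in points.items()
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    }


clicked = point_select(points, cursor=(2.1, 2.9))
missed = point_select(points, cursor=(3.0, 3.0))
banded = region_select(points, (0.0, 0.0), (4.5, 2.0))
print(clicked, missed, sorted(banded))
```

Sorting the corner coordinates lets the user drag the rubber band in any direction.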
16. Brushing and Linking
Selecting data in one view automatically highlights related data in another view.
Multiple visualizations become connected.
Example: select players with high salaries and see them highlighted in other baseball statistics charts.
Brushing and linking is one of the most important interaction techniques.
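The core of brushing and linking is a selection that every view shares. In this sketch the player records, the field names, and the `brush` and `render_view` helpers are all hypothetical; a real system would redraw charts instead of returning lists.

```python
# Hypothetical player table shared by all views (salary in millions).
players = [
    {"name": "Ann", "salary": 9.5, "home_runs": 31},
    {"name": "Ben", "salary": 1.2, "home_runs": 8},
    {"name": "Cal", "salary": 12.0, "home_runs": 40},
    {"name": "Dee", "salary": 0.8, "home_runs": 22},
]


def brush(records, field, low, high):
    """Brush: return the names of records whose field lies in [low, high]."""
    return {r["name"] for r in records if low <= r[field] <= high}


def render_view(records, field, selected):
    """Link: list every record for this view and flag the shared selection."""
    return [(r["name"], r[field], r["name"] in selected) for r in records]


# The user brushes high salaries in the salary view...
selected = brush(players, "salary", 8.0, 20.0)

# ...and the home-run view highlights the same players.
home_run_view = render_view(players, "home_runs", selected)
for name, value, highlighted in home_run_view:
    print(name, value, "<- highlighted" if highlighted else "")
```

Keying the selection on a stable identifier (here the name) is what allows views with different axes to stay in sync.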
17. Linked Highlighting
Highlighting selected data across multiple visualizations simultaneously.
Helps compare patterns across different views.
18. Dynamic Queries
Interactive filtering where results update immediately as controls are adjusted.
Example: adjust a price slider and instantly see matching houses.
🧠Dynamic Queries = Instant Feedback
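A dynamic query can be thought of as a filter that reruns every time a control moves. The sketch below simulates a price slider with a loop; the house data and the `query` function are hypothetical.

```python
# Hypothetical listings; field names are assumptions for illustration.
houses = [
    {"id": "A", "price": 320_000, "bedrooms": 2},
    {"id": "B", "price": 450_000, "bedrooms": 3},
    {"id": "C", "price": 280_000, "bedrooms": 1},
    {"id": "D", "price": 610_000, "bedrooms": 4},
]


def query(houses, max_price, min_bedrooms):
    """Return the ids of houses matching the current control settings."""
    return [
        h["id"]
        for h in houses
        if h["price"] <= max_price and h["bedrooms"] >= min_bedrooms
    ]


# Simulate dragging the price slider: each new position reruns the query.
results = {}
for max_price in (300_000, 400_000, 500_000):
    results[max_price] = query(houses, max_price, min_bedrooms=2)
    print(max_price, results[max_price])
```

For the feedback to feel instant, each rerun has to finish quickly, which is why large datasets usually need indexing or pre-aggregation.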
19. Problems with Text-Based Queries
- Rigid syntax
- Difficult for beginners
- Slow interaction cycle
- No guidance for reformulation
- Results often returned as tables
20. Direct Manipulation
Users interact directly with visual objects rather than typing commands.
- Point and click
- Drag and drop
- Immediate feedback
- Reversible actions
Easier and more intuitive than textual queries.
21. Dynamic Query Advantages & Disadvantages
| Pros | Cons |
| --- | --- |
| Easy for beginners | Limited query complexity |
| Fast exploration | Many controls may clutter interface |
| Immediate feedback | Screen space limitations |
22. Trellis Display
A framework that divides data into multiple panels based on categories.
Allows easy comparison across groups.
Example: split a scatterplot by gender or political affiliation.
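The data preparation behind a trellis display is a group-by on the conditioning variable. In this sketch the records, the field names, and the `trellis_panels` helper are hypothetical, and "group" stands in for any categorical variable.

```python
from collections import defaultdict

# Hypothetical survey records; "group" is the categorical conditioning variable.
records = [
    {"group": "X", "age": 25, "income": 30},
    {"group": "Y", "age": 40, "income": 55},
    {"group": "X", "age": 52, "income": 48},
    {"group": "Y", "age": 31, "income": 62},
    {"group": "Z", "age": 45, "income": 41},
]


def trellis_panels(records, by, x, y):
    """Split records into one list of (x, y) points per category."""
    panels = defaultdict(list)
    for r in records:
        panels[r[by]].append((r[x], r[y]))
    return dict(panels)


panels = trellis_panels(records, by="group", x="age", y="income")

# Shared axis limits keep the panels directly comparable.
x_limits = (min(r["age"] for r in records), max(r["age"] for r in records))
y_limits = (min(r["income"] for r in records), max(r["income"] for r in records))

for name in sorted(panels):
    print(name, panels[name], "x:", x_limits, "y:", y_limits)
```

Each entry of `panels` would be drawn as its own small scatterplot, all on the same scales.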
23. Big Data Visualization
Interactive visualization must remain responsive even with billions of records.
Two Major Challenges:
- Effective visual encoding
- Real-time interaction
24. Big Data Techniques
| Technique | Purpose |
| --- | --- |
| Sampling | Use subset of data |
| Binning | Group values together |
| Modeling | Represent patterns efficiently |
| Aggregation | Summarize large datasets |
🧠Don't visualize billions of records directly.
Summarize first.
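The four techniques can be sketched on synthetic data. Everything here is an assumption for illustration: the 100,000 generated points stand in for a much larger dataset, and the sample size, cell size, and choice of a mean plus standard deviation as the "model" are arbitrary.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Stand-in for a large dataset: 100,000 synthetic (x, y) points.
points = [(random.gauss(50, 15), random.gauss(50, 15)) for _ in range(100_000)]

# Sampling: draw a subset small enough to plot point by point.
sample = random.sample(points, k=1_000)

# Binning + aggregation: count the points in each 10 x 10 grid cell.
# The counts could be drawn as a heatmap instead of 100,000 dots.
cell = 10
cell_counts = Counter((int(x // cell), int(y // cell)) for x, y in points)

# Modeling: two fitted numbers summarize the whole x distribution.
xs = [x for x, _ in points]
model = (statistics.fmean(xs), statistics.pstdev(xs))

print("sample size:", len(sample))
print("grid cells to draw:", len(cell_counts))
print("model (mean, std):", tuple(round(v, 1) for v in model))
```

The number of marks to draw now depends on the grid resolution rather than on the number of records, which is what keeps interaction responsive.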
25. Final Exam Summary
Most Important Points
- EDA: Explore data before designing final visualizations or models.
- Tukey (1977): Introduced EDA.
- EDA Goals: Find distributions, outliers, correlations and quality issues.
- Taxonomy of Interaction: Data & View Specification, View Manipulation, Process & Provenance.
- Selection: Point selection and region selection.
- Brushing & Linking: Highlight related data across views.
- Dynamic Queries: Interactive filtering with immediate feedback.
- Direct Manipulation: Pointing instead of typing.
- Trellis Displays: Compare categories using multiple panels.
- Big Data: Use sampling, binning, modeling and aggregation.